We present a novel stochastic Hebb-like learning rule for neural networks. The rule is
stochastic with respect to the selection of the time points at which a synaptic modification
is induced by pre- and postsynaptic activation. Moreover, the rule affects not only the
synapse between the pre- and postsynaptic neuron (homosynaptic plasticity) but also more
remote synapses of the pre- and postsynaptic neuron. This form of plasticity, called
heterosynaptic plasticity, has recently attracted the interest of experimental
investigations. Our learning rule gives a qualitative explanation of this kind of synaptic
modification.
I. INTRODUCTION

What are the mechanisms that modulate learning on a neuronal level in animals or humans?
This question is still under debate, but the common picture of a biological learning rule
is that the synaptic weights are changed according to a local rule. In the context of
neural networks, local means that only the neurons adjacent to a synapse contribute to
changes of its synaptic weight. Such a mechanism for synaptic strengthening was proposed
by Donald Hebb in 1949 and found experimentally by T. Bliss and T. Lomo. In biological
terms, Hebbian learning is called long-term potentiation (LTP).

Experimentally as well as theoretically, there is a large body of investigations aiming to
formulate precise conditions under which learning in neural networks takes place. For
example, the influence of the precise timing of pre- and postsynaptic neuron firing [?]
and the duration of a synaptic change, termed short- or long-term plasticity (for a review
see [17]), have been studied extensively. All of these analyses share the locality
condition proposed by Hebb [?].

But there are also experimental findings which extend the traditional view of synaptic
plasticity in three important points. First, Frey and Morris [12] found in the hippocampus
of rats in vivo that there is a synaptic tagging mechanism. This mechanism tags synapses
which were repeatedly involved in information processing within a certain time window of
up to 1.5 h. If one of these synapses is restimulated within this time interval, then LTP
is induced. They concluded that there is a form of memory for past synaptic activity,
which sums up past activities to induce LTP. The second result is from Otmakhova and
Lisman [21], who found an influence of a global dopamine signal on LTP and LTD in CA1 in
the hippocampus. A third extension consists of results on heterosynaptic plasticity. In
this form of learning, not only the synapse between the active pre- and postsynaptic
neuron is changed but also more remote synapses of these neurons [2]. Heterosynaptic
plasticity is observed for LTP as well as LTD.

We emphasize that none of these additional findings excludes the classical locality
condition of Hebb; rather, they involve further contributions that specify learning more
precisely.

Based on the exciting results of Frey and Morris [12] and Otmakhova and Lisman [21], there
are theoretical investigations on learning dynamics in neural networks which interweave
the locality condition of Hebb with a synaptic tagging mechanism and a global control
signal. Chialvo and Bak suggested a learning rule which assigns each synapse a Boolean
variable indicating whether the synapse was involved in the last information processing
step or not. This mimics the tagging mechanism. Additionally, they assigned to the global
control signal the role of an external reinforcement signal $r$, which provides feedback
on the network performance. The reinforcement signal can take only two values, where
$r = 1$ corresponds to a right and $r = -1$ to a wrong output of the network. A synaptic
update was allowed only if the synapse was activated during the last signal processing
step and the reinforcement signal signaled a failure due to a wrong output of the network.
An extension of the learning rule of Bak and Chialvo was presented by Klemm, Bornholdt and
Schuster [?]. They allowed a synaptic memory $c$ for each synapse with
$\Theta + 2 \in \mathbb{N}$ discrete values. The dynamics of the synaptic counter $c$ is
in each time step given by $c_{t+1} = c_t - r_t$ for active synapses, restricted from
below to 0. If $c_t - r_t > \Theta$ occurs because the output of the network was wrong and
hence a reinforcement signal $r_t = -1$ was fed back into the network, then the
corresponding synapse is depressed by a fixed amount $\delta$ and the corresponding
synaptic counter is set to $c_t - r_t = \Theta$.

In this paper we present a novel Hebb-like learning rule which has a memory for past
failures similar to [?]. However, in contrast to these works we do not use synaptic but
neuron counters. Because the neuron counters can be seen as an approximation of the
synaptic counters, we are led to a stochastic update condition instead of a deterministic
one for active synapses. The resulting stochastic learning rule, whose character is still
local, can be interpreted biologically and corresponds qualitatively to heterosynaptic
plasticity [2].

The paper is organized as follows. In section II we describe the model with which we
investigate our stochastic learning rule. The learning rule itself is motivated and
defined in section II A. The results section III is subdivided into three parts. Because
the learning rule of a neural network is only one part of the entire system, we
investigate the interplay between our stochastic learning rule and three different network
dynamics and hence their influence on the convergence behavior of the neural network. We
compare a winner-take-all (III A), a softmax (III B) and a noisy winner-take-all mechanism
(III C), which are all different forms of lateral inhibition. In section III C we
additionally investigate the influence of a variable size of the synaptic change $\delta$.
The results are discussed and compared with [16]. A biological interpretation of our
stochastic Hebb-like learning rule with respect to heterosynaptic plasticity is given in
section IV. The paper ends in section V with a summary and conclusions.

II. THE MODEL 

To investigate the learning dynamics of a neural network, one has to define every item in
table I.

1. topology of the neural network 

2. neuron model 

3. network dynamics 

4. learning rule 

5. environment 

6. interaction of the TBM with the environment 
TABLE I: Characterization of the entire system 

By Toy-Brain-Model (TBM) we summarize points 1.) to 4.) in table I. The concrete
definitions for each part are as follows.

1.) Topology of the neural network: We choose a feedforward network with three layers.
The layers consist of $I$ input, $H$ hidden and $O$ output neurons. The neurons of
adjacent layers are all-to-all connected with synapses $w_{ij} \in \mathbb{R}^+$.

2.) Neuron model: The neurons are binary, $x_i \in \{0, 1\}$ with
$i \in \{1, \ldots, I+H+O\}$. As network dynamics we use three different types to
investigate the interplay with our learning rule: a winner-take-all, a softmax [16] and a
noisy winner-take-all mechanism [?]. In all three cases only one neuron is chosen to be
active in the hidden and in the output layer according to the network dynamics. This
corresponds to a low-activity limit, which was called extremal dynamics in [?].

3.a) Network dynamics (winner-take-all): The inner fields of the neurons are calculated by

$$h_j = \sum_{i}^{\mathrm{all}} w_{ji} x_i. \tag{1}$$

Here "all" means all neurons of the preceding layer. The active neuron in each layer is
simply the one with the highest inner field,

$$i_{\max} = \operatorname{argmax}_i (h_i), \tag{2}$$

which is set to $x_{i_{\max}} = 1$. All other neurons are set to zero. The winner-take-all
mechanism is a purely deterministic selection mechanism, uniquely determined by the inner
fields of the neurons.
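
As a minimal sketch of equations (1) and (2), the selection step could be written in Python as follows. The storage layout (one weight row per neuron of the receiving layer), the function names, the example weights and the tie rule (lowest index wins) are assumptions made for illustration; the text does not specify them.

```python
from typing import List, Sequence


def inner_fields(weights: Sequence[Sequence[float]], x: Sequence[int]) -> List[float]:
    """h_j = sum_i w_ji * x_i over all neurons i of the preceding layer (eq. 1)."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in weights]


def winner_take_all(h: Sequence[float]) -> List[int]:
    """Activate only the neuron with the highest inner field (eq. 2)."""
    # Assumption: ties are resolved in favor of the lowest index.
    i_max = max(range(len(h)), key=lambda i: h[i])
    return [1 if i == i_max else 0 for i in range(len(h))]


# Hypothetical example: 3 input neurons (the last one acts as bias), 2 hidden neurons.
w_hidden = [[0.2, 0.9, 0.1],   # weights into hidden neuron 0
            [0.7, 0.3, 0.4]]   # weights into hidden neuron 1
x_input = [1, 0, 1]
h_hidden = inner_fields(w_hidden, x_input)   # approximately [0.3, 1.1]
x_hidden = winner_take_all(h_hidden)         # [0, 1]
```

The same two functions would be applied once for the hidden layer and once for the output layer, with the hidden activity serving as input to the second call.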

3.b) Network dynamics (softmax): The inner fields of the neurons are calculated by
equation (1), but the activity $x_j$ of the neurons is now obtained by choosing one neuron
from the probability distribution

$$p_j = Z^{-1} \exp(\beta h_j), \tag{3}$$
$$Z = \sum_j \exp(\beta h_j). \tag{4}$$

The activity of the chosen neuron is set to one and the other neurons are set to zero. The
temperature-like parameter $\beta^{-1}(t) = \beta^{-1} \in \mathbb{R}^+$ is held constant.
By $\beta$ one can regulate the stochastic character of equations (3) and (4): for
$\beta = 0$ one obtains $p_j = 1/H$ for the hidden and $p_j = 1/O$ for the output layer
for all $j$ in the respective layer, which corresponds to uniform distributions, whereas
$\beta \to \infty$ results in a deterministic selection of the neuron with the highest
inner field in each layer, which is equivalent to the winner-take-all network dynamics.
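
The softmax selection can be sketched in the same way. This is an illustrative Python version under the assumption that a seeded `random.Random` instance supplies the randomness; the function names are hypothetical. Subtracting the largest inner field before exponentiating is a standard numerical safeguard and does not change the probabilities.

```python
import math
import random
from typing import List, Sequence


def softmax_probabilities(h: Sequence[float], beta: float) -> List[float]:
    """p_j = exp(beta * h_j) / Z with Z = sum_j exp(beta * h_j) (eqs. 3 and 4)."""
    h_max = max(h)  # shift by the maximum: same p_j, but no overflow for large beta
    weights = [math.exp(beta * (h_j - h_max)) for h_j in h]
    z = sum(weights)
    return [w / z for w in weights]


def softmax_select(h: Sequence[float], beta: float, rng: random.Random) -> List[int]:
    """Draw exactly one active neuron according to the softmax probabilities."""
    p = softmax_probabilities(h, beta)
    j = rng.choices(range(len(h)), weights=p, k=1)[0]
    return [1 if i == j else 0 for i in range(len(h))]


rng = random.Random(0)                       # fixed seed only for reproducibility
fields = [0.3, 1.1, 0.5]                     # hypothetical inner fields
x_layer = softmax_select(fields, beta=10.0, rng=rng)
p_uniform = softmax_probabilities(fields, beta=0.0)      # every entry equals 1/3
p_sharp = softmax_probabilities(fields, beta=1000.0)     # mass concentrates on index 1
```

The two limiting calls illustrate the statement above: beta = 0 gives a uniform choice, and a very large beta reproduces the winner-take-all selection.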

3.c) Network dynamics (noisy winner-take-all): The inner fields of the neurons are again
calculated by equation (1). In this case, the active neuron of each layer,
$x_{i_{\max}} = 1$, is the one with the highest value after the addition of noise:

$$\tilde{h}_i = h_i + \eta_i, \tag{5}$$
$$i_{\max} = \operatorname{argmax}_i (\tilde{h}_i). \tag{6}$$

The noise $\eta_i$ is drawn uniformly from $[0, \eta]$. Again, by $\eta$ one can regulate
the stochastic character of the selection mechanism, and for $\eta \to 0$ one obtains the
deterministic winner-take-all mechanism.
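
A corresponding sketch of the noisy winner-take-all selection, again with hypothetical names and a seeded `random.Random` instance as an assumed noise source:

```python
import random
from typing import List, Sequence


def noisy_winner_take_all(h: Sequence[float], eta: float, rng: random.Random) -> List[int]:
    """Add independent uniform noise from [0, eta] to each field, then pick the maximum."""
    noisy = [h_i + rng.uniform(0.0, eta) for h_i in h]          # eq. (5)
    i_max = max(range(len(noisy)), key=lambda i: noisy[i])      # eq. (6)
    return [1 if i == i_max else 0 for i in range(len(noisy))]


rng = random.Random(1)
fields = [0.3, 1.1, 0.5]                                   # hypothetical inner fields
x_no_noise = noisy_winner_take_all(fields, 0.0, rng)       # eta = 0: plain winner-take-all
x_noisy = noisy_winner_take_all(fields, 5.0, rng)          # large eta: winner may differ
```

With eta = 0 the noise term vanishes and the call reduces to the deterministic selection, in line with the limit stated in the text.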

The definition of the learning rule is postponed to subsection II A because it is the
central point of this paper.

5.) Environment: As the problem to be learned by the network we choose the exclusive-or
(XOR) mapping shown in table II.

    x3  x2  x1 | x8  x7
     0   0   1 |  1   0
     0   1   1 |  0   1
     1   0   1 |  0   1
     1   1   1 |  1   0

TABLE II: Exclusive-or (XOR) mapping

Here $x_1$ is a bias introduced to exclude the case of zero activity in the input and
hence in all subsequent layers. We have chosen the exclusive-or (XOR) mapping for two
reasons. First, the problem is not linearly separable; it cannot be learned by a single
perceptron but only by a multilayer network. However, until the discovery of the
back-propagation algorithm by Rumelhart, Hinton and Williams [?] in the 1980s, there was
no systematic method known to adjust the synaptic weights of such a network. Still, the
problem with the back-propagation algorithm is that it is not biologically plausible,
because it requires the back-propagation of an error in the network which cannot be known
[?]. For this reason, learning by back-propagation is classified as supervised learning
[14]. Second, the biologically plausible learning rules proposed in [?] and [?] have been
shown to cope with the exclusive-or (XOR) problem.

We call the exclusive-or (XOR) mapping the environment of the neural network in order to
keep in mind that living beings with brains are always situated in an environment in which
they live. The abstract environment in which our TBM lives is the exclusive-or (XOR)
problem. In the context of Artificial Intelligence or Computational Neuroscience this is
called embodiment [?].

6.) Interaction of the TBM with the environment: Every input pattern in table II is
presented with equal probability and independently of preceding patterns.

All the points defined above form the framework of this paper. We now turn to the
motivation and definition of the learning rule.



A. Definition of the stochastic Hebb-like learning rule



The question we would like to answer with respect to the learning rules proposed in [?]
is: Can the idea of a synaptic counter be simplified? Let us consider the consequences of
a synaptic memory in biological living beings. According to [?] there are $10^{\ldots}$
synapses in the human and $10^{13}$ synapses in the rat's brain, but only $10^{12}$ and
$10^{10}$ neurons, respectively. Hence one can ask whether a learning rule based on a
neuron memory can achieve learning results comparable to those of a learning rule based on
synaptic counters [?]. The insights one can obtain by answering this question are twofold.
First, by introducing a learning rule based on neuron memory instead of synaptic memory,
one can show that the learning rules proposed in [?] are not minimal in terms of
economical use of resources. Second, a biological interpretation of the working mechanism
of a learning rule with neuron memory could reveal novel insights into synaptic
plasticity, because the starting point was a more mathematical one. In the following we
give a brief sketch of our way to a Hebb-like learning rule with neuron memory.

For the topology of the neural network defined in section II, as well as for any other
network topology, one can find a linear mapping $M$ with $c_n = M c_s$. Here
$c_s \in \mathbb{N}^S$ and $c_n \in \mathbb{N}^N$ are vectors whose components are the
synaptic and neuron counters. The components of the linear mapping $M$ are easily obtained
by summing up the incoming synaptic counters of a neuron, which has to be equal to the
neuron counter of that neuron. The same holds for the sum of the synaptic counters which
come out of the neuron [?].

The crucial point is that we do not use synaptic but neuron counters, and hence we are
interested in the inverse mapping. But for non-trivial network topologies the numbers of
synapses $S$ and neurons $N$ are different, which gives a non-square matrix $M$ whose
inverse is not defined in linear algebra. A way out is to use the Moore-Penrose
pseudoinverse [20, 22], which is also defined for non-square matrices. Calculations for
our three-layer neural network reveal that in general the synaptic counters $c_{ij}$ are
not simply the sum of the adjacent neuron counters $c_i$ and $c_j$ but also depend on far
remote neuron counters [9]. This would result in a non-local learning rule, which violates
the postulate of Hebb. To avoid this non-locality we introduce a stochastic instead of a
deterministic approximation scheme, which is described in the rest of this section.
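
The non-locality of the pseudoinverse can be checked numerically on a small example. The sketch below uses NumPy and rests on an assumption: $M$ is built as the neuron-synapse incidence matrix (entry 1 where a synapse is attached to a neuron), which is one possible reading of the construction described above, not necessarily the authors' exact matrix. The network size (2 input, 2 hidden, 1 output neuron) is chosen only to keep the example small.

```python
import numpy as np


def incidence_matrix(n_in: int, n_hid: int, n_out: int):
    """Return (M, synapses) for a fully connected three-layer feedforward network.

    Assumed construction: M[n, s] = 1 if synapse s is attached to neuron n, so that
    c_n = M @ c_s sums the counters of all synapses touching each neuron.
    """
    inputs = range(n_in)
    hidden = range(n_in, n_in + n_hid)
    outputs = range(n_in + n_hid, n_in + n_hid + n_out)
    synapses = [(i, j) for i in inputs for j in hidden]
    synapses += [(j, k) for j in hidden for k in outputs]
    m = np.zeros((n_in + n_hid + n_out, len(synapses)))
    for s, (pre, post) in enumerate(synapses):
        m[pre, s] = 1.0
        m[post, s] = 1.0
    return m, synapses


M, synapses = incidence_matrix(2, 2, 1)   # N = 5 neurons, S = 6 synapses
M_pinv = np.linalg.pinv(M)                # shape (S, N): maps c_n to an estimate of c_s

# Largest weight that the estimate for a synapse puts on a neuron it is NOT attached to.
remote = max(abs(M_pinv[s, n])
             for s, (pre, post) in enumerate(synapses)
             for n in range(M.shape[0]) if n not in (pre, post))
print(M.shape, M_pinv.shape, remote > 1e-9)
```

If the printed flag is true, the pseudoinverse estimate of a synaptic counter draws on counters of neurons that are not adjacent to that synapse, which is the non-locality the stochastic scheme is meant to avoid.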

Similar to [?], again only active synapses $w_{ij}$ which were involved in the last signal
processing step can be updated if $r = -1$, which corresponds to a wrong network output.
But now there is a stochastic condition

$$p_{\mathrm{coin}} < p^{\mathrm{rank}}_{\tilde{c}_{ij}}, \tag{7}$$

defined by equations (11) and (14) below, which has to be fulfilled to update the synaptic
weights by

$$w_{ij} \to w_{ij} - \delta, \tag{8}$$

with $\delta(t) = \delta \in \mathbb{R}^+$.

In addition to the network dynamics of the neurons there is a dynamics for the neuron
counters $c_i$, defined by

$$c_i \to \begin{cases} \Theta, & \text{if } c_i - r > \Theta, \\ c_i - r, & \text{if } \Theta \ge c_i - r \ge 0, \\ 0, & \text{if } 0 > c_i - r. \end{cases} \tag{9}$$

Here $\Theta \in \mathbb{N}$ is a threshold, $r = \pm 1$ a reinforcement signal and $c_i$
a neuron counter. Equation (9) concerns only the neuron counters of the active neurons.
The other neuron counters remain unchanged.
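
Read as code, the counter dynamics amounts to shifting the counter by $-r$ and confining the result to the interval from 0 to the threshold. A minimal Python sketch (the function name is hypothetical):

```python
def update_neuron_counter(c: int, r: int, theta: int) -> int:
    """Counter dynamics of an active neuron: c -> c - r, clipped to [0, theta].

    r = -1 (wrong output) raises the counter by one, r = +1 (right output) lowers it.
    """
    return max(0, min(theta, c - r))


theta = 3
c = 0
for r in (-1, -1, -1, -1, +1):     # four failures followed by one success
    c = update_neuron_counter(c, r, theta)
# c went 1, 2, 3, 3 (capped at theta), then back down to 2
```

The function is meant to be called only for neurons that were active in the last processing step; all other counters are left as they are.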

To obtain the stochastic update condition
$p_{\mathrm{coin}} < p^{\mathrm{rank}}_{\tilde{c}_{ij}}$ one has to go through the
following procedure:

1. Calculate the approximated synaptic counters $\tilde{c}_{ij}$ of the active synapses
from the neuron counters:

$$\tilde{c}_{ij} = c_i + c_j. \tag{10}$$

2. Because $c_i \in \mathbb{N}$ holds for all $i \in \{1, \ldots, N\}$, it follows that
$\tilde{c}_{ij} \in \mathbb{N}$. Hence one can assign to each approximated synaptic
counter $\tilde{c}_{ij}$ of each active synapse a probability
$p^{\mathrm{rank}}_{\tilde{c}_{ij}}$, which is given by the rank ordering distribution

$$P^{\mathrm{rank}}_k \propto k^{-\tau}, \tag{11}$$
$$k \in \{1, \ldots, 2\Theta + 3\}, \tag{12}$$
$$\tau \in \mathbb{R}^+, \tag{13}$$

with the mapping $k = 2\Theta + 3 - \tilde{c}_{ij}$, which is motivated by [?].

3. For the distribution $P^{\mathrm{coin}}(x)$ we also choose a power law,

$$P^{\mathrm{coin}}(x) \propto x^{-\alpha}, \tag{14}$$
$$x \in [0, 1], \tag{15}$$
$$\alpha \in \mathbb{R}^+, \tag{16}$$

from which a probability $p_{\mathrm{coin}}$ is drawn for each active synapse.
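
The three steps can be sketched in Python as follows. Several points are assumptions rather than statements of the paper: the approximated synaptic counter is taken as the sum of the two adjacent neuron counters; the rank distribution is normalized over all admissible ranks; the coin density is taken to fall off as a power law with exponent $-\alpha$ (the sign that makes a larger $\alpha$ lower the average coin value, as the results section describes); and a lower cutoff `x_min` is introduced so that this density can be normalized for every $\alpha$. All function names are hypothetical.

```python
import random
from typing import List


def rank_probabilities(theta: int, tau: float) -> List[float]:
    """Normalized P_k proportional to k**(-tau), k = 1, ..., 2*theta + 3 (index 0 is k = 1)."""
    weights = [k ** (-tau) for k in range(1, 2 * theta + 4)]
    z = sum(weights)
    return [w / z for w in weights]


def p_rank(c_i: int, c_j: int, theta: int, tau: float) -> float:
    """Steps 1 and 2: approximated counter, rank mapping, rank probability."""
    c_tilde = c_i + c_j                 # assumed form of the approximated counter
    k = 2 * theta + 3 - c_tilde         # a high counter corresponds to a low rank k
    return rank_probabilities(theta, tau)[k - 1]


def draw_p_coin(alpha: float, rng: random.Random, x_min: float = 1e-3) -> float:
    """Step 3: inverse-transform sample of a density proportional to x**(-alpha) on [x_min, 1]."""
    u = rng.random()
    if abs(alpha - 1.0) < 1e-12:        # logarithmic special case alpha = 1
        return x_min ** (1.0 - u)
    a = 1.0 - alpha
    return (x_min ** a + u * (1.0 - x_min ** a)) ** (1.0 / a)


def synapse_is_updated(c_i: int, c_j: int, theta: int, tau: float,
                       alpha: float, rng: random.Random) -> bool:
    """Stochastic update condition p_coin < p_rank for one active synapse."""
    return draw_p_coin(alpha, rng) < p_rank(c_i, c_j, theta, tau)


rng = random.Random(0)
probs = rank_probabilities(theta=2, tau=1.5)
decision = synapse_is_updated(c_i=2, c_j=1, theta=2, tau=1.5, alpha=1.5, rng=rng)
```

In a full simulation this test would be applied to each active synapse only after a wrong output ($r = -1$), and a positive decision would trigger the weight decrement.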

Let us compare and clarify the differences between the working mechanisms of the learning
rules proposed in [?] and ours. The learning rule of Bak and Chialvo always updates all
active synapses if the reinforcement signal is $r = -1$. The learning rule of Klemm,
Bornholdt and Schuster updates an active synapse only if $r = -1$ and the synaptic counter
exceeds a threshold. Our stochastic learning rule updates an active synapse only if
$r = -1$ and the condition $p_{\mathrm{coin}} < p^{\mathrm{rank}}_{\tilde{c}_{ij}}$ is
fulfilled, that is, with a certain probability which depends on the value of the
approximated synaptic counter $\tilde{c}_{ij}$.
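
The difference between the three rules comes down to the condition under which an active synapse is changed. As an illustrative summary, with hypothetical function names and with the counter and probability values assumed to be computed elsewhere:

```python
def update_bak_chialvo(r: int) -> bool:
    """Every active synapse is changed whenever the output was wrong."""
    return r == -1


def update_kbs(r: int, synaptic_counter: int, theta: int) -> bool:
    """Changed only if the output was wrong and the synaptic counter exceeds theta."""
    return r == -1 and synaptic_counter > theta


def update_stochastic(r: int, p_coin: float, p_rank: float) -> bool:
    """Changed only if the output was wrong and the drawn coin value is below p_rank."""
    return r == -1 and p_coin < p_rank


# Same failure event, three different decisions for one active synapse
# (all numbers are made up for illustration).
decisions = (
    update_bak_chialvo(-1),
    update_kbs(-1, synaptic_counter=2, theta=3),
    update_stochastic(-1, p_coin=0.05, p_rank=0.20),
)
```

Only the third predicate involves randomness, so repeated failures with identical counters can lead to different outcomes from one time step to the next.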



A. winner-take-all 

The convergence behavior with the winner-take-all mechanism as network dynamics is shown
in figure 1. The mean ensemble error $E(t)$ is plotted for various $\Theta$ in a
semi-logarithmic plot. $E(t)$ converges rapidly for all $\Theta \in \{0, \ldots, 5\}$
(only $\Theta \in \{5, 1, 2\}$, from bottom to top, are shown) within 1500 time steps to
an error below $10^{-2}$.




FIG. 1: Comparison of the mean error $E(t)$ for different values of $\Theta$. The notation
"KBS" indicates the learning rule with a synaptic counter [?]. For our learning rule with
neuron counter, $E$ is shown for $\Theta = 5$, $\Theta = 1$ and $\Theta = 2$ (from bottom
to top). The exponents $\tau$ and $\alpha$ are taken from table III at $t = 1500$. The
size of the ensemble was 10000.



III. RESULTS 

We investigate the working mechanism of our novel stochastic Hebb-like learning rule with
respect to the learning behavior of a three-layer feedforward network for the exclusive-or
(XOR) problem and the influence of several parameters which constitute the entire system
in table I. We consider the influences of different network dynamics, the temperature-like
parameter $\beta$, the noise $\eta$ and the three parameters $\Theta$, $\alpha$ and $\tau$
which determine our stochastic Hebb-like learning rule. The following subsections are
organized according to the network dynamics. For all simulations we calculated the mean
ensemble error $E(t) \in [0, 1]$,

$$E(t) = \frac{1}{N} \sum_{i=1}^{N} e_i(t), \tag{17}$$
$$e_i(t) \in \{0, 1\}, \tag{18}$$

to evaluate the network performance and hence its learning behavior. Here
$e_i(t) \in \{0, 1\}$ indicates whether the output of network $i$ at time step $t$ was
right, $e_i(t) = 0$, or wrong, $e_i(t) = 1$. The ensemble size is given in the respective
subsections. In all simulations the synaptic weights $w_{ij}$ are initialized i.i.d. from
the interval $[0, 1]$ and the neuron counters $c_i$ are set to 0.



The best results are obtained for $\Theta = 5$. A further increase of $\Theta$ does not
improve the convergence behavior (not shown). A comparison with the learning rule proposed
in [?], indicated by $\Theta_{\mathrm{KBS}}$ in figure 1, reveals a stronger dependence of
the convergence behavior on the synaptic memory. This is a hint that our learning rule,
due to its stochastic character regulated by the exponents $\alpha$ and $\tau$, is more
flexible with respect to different neuron memories, which correspond to an evaluation of
an individual failure rate as explained in [16].

The best exponents for the neuron memory of our stochastic learning rule were obtained by
simulations in which we systematically investigated the dependence of the mean ensemble
error $E$ on $\alpha$ and $\tau$. We chose the exponents from $0 \le \alpha, \tau \le 3$
in discrete steps of $10^{-1}$ and kept them fixed for the ensemble simulation. The
corresponding results for $\Theta = 0$ and $\Theta = 3$ are shown in figure 2. In these
gray-scale contour plots the mean ensemble error $E$ is shown at three different time
steps $t \in \{500, 1000, 1500\}$, which are chosen according to figure 1. Black
corresponds to $E = 0.0$ and white to $E = 0.5$. One can see that there are regions in
every plot which differ greatly in the respective mean ensemble error. Furthermore, the
structure of the plots is recognizably deformed for different $\Theta$. This can be
qualitatively understood if one starts from $\Theta = 0$ and $\alpha$ and $\tau$ values
which correspond to $E_{\min}$. It follows from these assumptions that the update
probability for fulfilling the condition
$p_{\mathrm{coin}} < p^{\mathrm{rank}}_{\tilde{c}_{ij}}$ is optimal for these parameters.
An increase in $\Theta$ leads to a decrease in $p^{\mathrm{rank}}_{\tilde{c}_{ij}}$ for
each $\tilde{c}_{ij}$; hence the update condition is less often fulfilled. This can be
compensated by an increase of $\alpha$, which reduces the average value of
$p_{\mathrm{coin}}$ and so increases the probability that the update condition is
satisfied. The interplay between $\Theta$, $\alpha$ and $\tau$, which constitute our
stochastic learning rule, regulates the synaptic update probability and hence results in
the deformation observed in figure 2.

FIG. 2: Mean error $E$ in dependence of $\tau$ and $\alpha$ at the time steps $t = 500$,
1000 and 1500 for $\Theta = 0$ (top) and $\Theta = 3$ (bottom). The network dynamics was a
winner-take-all mechanism and the ensemble size 10000.

Table III gives a summary of the simulation results for all $\Theta \in \{0, \ldots, 5\}$
and shows the best exponents, for which the mean ensemble error is minimal at the
corresponding time steps $t \in \{500, 1000, 1500\}$. The most interesting result in
table III is that the best $\alpha$ values are all greater than zero. This rules out a
uniform distribution for $P^{\mathrm{coin}}(x)$ and provides additional justification for
the power-law ansatz in equation (14).

TABLE III: Minimal mean ensemble error $E_{\min}$ in dependence of the exponents $\alpha$
and $\tau$ and of the neuron counter $\Theta$, for the time steps $t = 500$, 1000 and 1500
(left, middle and right column). (*) means that in this case there are three pairs of
exponents for which the mean error is minimal. The other two pairs are:
$\tau = 2.7/3.0$, $\alpha = 0.9/1.0$. The network dynamics was a winner-take-all mechanism
and the ensemble size 10000.

             tau                 alpha                  E_min
    Theta   500  1000  1500     500  1000  1500     500    1000   1500
      0     2.0   1.7   1.4     0.4   0.3   0.3    0.110  0.024  0.005
      1     2.1   2.0   2.2     0.6   0.6   0.8    0.099  0.021  0.004
      2     1.8   2.1   2.6     0.9   1.1   1.2    0.113  0.122  0.007
      3     2.4   2.3   2.3     1.7   1.7   1.7    0.122  0.032  0.009
      4     2.2   2.2   2.2     2.8   3.0   3.0    0.112  0.022  0.004
      5     1.9   1.9   1.9     2.4   3.0   3.0    0.087  0.018  0.003



B. softmax 

We now present the results for a softmax mechanism as network dynamics, for which the
temperature-like parameter was chosen as $\beta = 10$ (cf. [16]). The convergence behavior
of the mean ensemble error $E$ is depicted in figure 4. Again learning takes place for all
$\Theta$. But now one can see the formation of two different parameter groups. The first
group with $\Theta \in \{0, 1, 2\}$ shows a slower convergence than the second one, and
the curves for the mean error $E(t)$ are almost identical for the three $\Theta$ values.
For this reason only $\Theta = 1$ is shown. The second group, consisting of
$\Theta \in \{3, 4, 5\}$, is faster and can be clearly distinguished from the first group
($\Theta = 4$, not shown, lies between $\Theta = 3$ and $\Theta = 5$). This is an effect
of $\beta = 10$, which introduces some kind of disorder into the system through the
stochastic selection of an active neuron given by equations (3) and (4). In the case of
winner-take-all dynamics the selection of an active neuron is deterministic and not
perturbed by the presence of a finite temperature or noise. Hence the average time to
learn the XOR mapping is lower than in the case with finite $\beta$. This can of course be
read from the values of the mean ensemble error $E(t)$ for corresponding $\Theta$ values
by comparing the results in figures 1 and 4. Thus, in the winner-take-all case there was
no significant difference in the convergence time for different $\Theta$ values, due to
the flexibility of the learning rule regulated by the exponents $\alpha$ and $\tau$. For
$\beta = 10$ the limit of the flexibility of our stochastic learning rule is reached. Now
there is a $\Theta$ below which the time used to average the individual failure rate of
each synapse is too short for an effective learning process. Nevertheless, even for
$\Theta \le 2$ one can find adequate exponents $\alpha$ and $\tau$ for which learning
takes place, although significantly slower.

FIG. 4: Mean ensemble error $E$ versus time $t$ for several values of $\Theta$. The
notation "KBS" again indicates that the learning rule with synaptic counters was used. The
exponents $\tau$ and $\alpha$ are taken from table IV for the respective values of
$\Theta$ at $t = 1500$. All simulations use the softmax mechanism with $\beta = 10$ as
network dynamics. The size of the ensemble was 10000.

Comparing this with the learning rule of Klemm, Bornholdt and Schuster shows again that
the best value for the synaptic counter, $\Theta_{\mathrm{KBS}} = 2$, converges fastest
and reaches $E = 10^{-2}$ at about $t = 2000$. Note that, in contrast to the
winner-take-all mechanism shown in figure 1, we can now observe intersections between
different convergence curves, however only at times when $E$ is already below $10^{-1}$.

The exponents used in figure 4 are again obtained by simulations for all values in the
interval $0 \le \alpha, \tau \le 3$ in discrete steps of $10^{-1}$. The results analogous
to figure 2 are shown in figure 3. One can see that the situation is quite similar to
figure 2, whereas the structure due to the influence of $\beta = 10$ is now more distinct.
A little surprising is the fact that the overall structure remained almost unchanged.
Hence the influence of $\beta$ seems to be almost linear, at least up to $\beta = 10$.
Table IV gives a summary of all simulation results and shows the best exponents, for which
the mean ensemble error $E$ is minimal at the corresponding time steps
$t \in \{500, 1000, 1500\}$. Again $\alpha$ is always greater than zero, which excludes a
uniform distribution for $P^{\mathrm{coin}}$.



TABLE IV: Minimal mean error $E_{\min}$ in dependence of the exponents $\alpha$ and $\tau$
and of the neuron counter $\Theta$, for the time steps $t = 500$, 1000 and 1500. The
network dynamics was governed by the softmax mechanism with $\beta = 10$.

             tau                 alpha                  E_min
    Theta   500  1000  1500     500  1000  1500     500    1000   1500
      0     0.2   0.0   0.5     0.3   0.2   0.1    0.325  0.250  0.200
      1     1.8   1.0   0.5     0.5   0.6   0.4    0.314  0.246  0.200
      2     3.0   3.0   3.0     1.2   1.1   1.3    0.312  0.237  0.197
      3     2.4   2.2   2.2     2.0   3.0   1.3    0.264  0.138  0.075
      4     2.2   2.2   2.2     3.0   2.9   3.0    0.239  0.121  0.069
      5     1.9   1.9   1.9     2.9   3.0   2.7    0.199  0.088  0.055
C. noisy winner-take-all

In this subsection we use the noisy winner-take-all mechanism [5] as network dynamics. In
contrast to the preceding results, we now investigate the convergence behavior over long
time scales under the influence of additive noise $\eta$ over several orders of magnitude.
Furthermore, we compare the influence of the size of the synaptic change $\delta$. For
this we use $\delta = 1 = \mathrm{const.}$, as in the preceding simulations, and compare
this with $\delta$ drawn uniformly from $[0, 20]$. Again we did the simulations for all
values $\Theta \in \{0, \ldots, 5\}$ but show only the results for $\Theta = 0$ and
$\Theta = 5$, which give the significant differences.

The evaluation of the mean ensemble error $E$ is slightly different here. We simulate as
long as it takes the network to reach a stationary fixed point and then average over the
next $T_m$ time steps. In addition we average over a (small) ensemble of size $N$:

$$E = \frac{1}{N (T_m + 1)} \sum_{i=1}^{N} \sum_{t=T-T_m}^{T} e_i(t), \tag{19}$$
$$e_i(t) \in \{0, 1\}. \tag{20}$$

Here $N = 100$, $T = 100000$ and $T_m = 10000$.
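
The double average can be made concrete with a few lines of Python. The nested-list storage of the error indicators and the function name are assumptions for illustration; the toy data are far smaller than the ensemble used in the paper.

```python
from typing import Sequence


def stationary_mean_error(errors: Sequence[Sequence[int]], t_final: int, t_m: int) -> float:
    """Average e_i(t) over all networks i and over t = t_final - t_m, ..., t_final (inclusive).

    errors[i][t] is 1 if network i gave a wrong output at time step t, else 0.
    """
    n = len(errors)
    total = sum(sum(e_i[t_final - t_m: t_final + 1]) for e_i in errors)
    return total / (n * (t_m + 1))


# Toy data: 2 networks, 5 time steps (t = 0, ..., 4), averaging window t = 2, ..., 4.
toy_errors = [[0, 1, 1, 0, 1],
              [1, 1, 0, 0, 0]]
e_mean = stationary_mean_error(toy_errors, t_final=4, t_m=2)   # 2 errors / (2 * 3) = 1/3
```

The window contains $T_m + 1$ time steps because both end points are included, which is why the normalization uses $T_m + 1$ rather than $T_m$.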

The first results are shown in figure 5. Here the mean ensemble error $E$ is investigated
in dependence of the noise $\eta$ and the exponent $\tau$ of the rank ordering
distribution. The exponent of the distribution $P^{\mathrm{coin}}(x)$ was fixed to
$\alpha = 1.5$ in view of the preceding results. In figure 5 as well as in figure 6,
comparable $\Theta$ values are arranged in rows, with $\Theta = 0$ in the top and
$\Theta = 5$ in the bottom row. The left columns give results for
$\delta = 1 = \mathrm{const.} \sim O(1)$ and the right columns for
$\delta \in [0, 20] \sim O(10)$. The scales of the synaptic changes crudely explain the
steps occurring in figures 5 and 6 at noise values of order $\eta \sim 10^0$ and
$\eta \sim 10^1$, respectively.

In the first row of figure 5 one can clearly see that the additional degree of freedom in
the form of a variable size of the synaptic alterations $\delta$ leads, for $\tau < 2.0$,
to a significant improvement of one order of magnitude in $\eta$. This effect can be
gradually reduced by increasing $\Theta$ [9] from 0 to 5. The lower part of figure 5 gives
the final result of this procedure for $\Theta = 5$. Here the influence of the additional
degree of freedom is not only completely eliminated, but for $\delta = 1$ one even obtains
better results in the range $1.0 < \tau < 2.0$.

This can be understood by taking into consideration that learning is effective if the
synaptic weights are changed as fast as possible and as often as necessary. At first sight
this looks like a contradiction, but new paths in the network which connect input with
output neurons correctly can be found only if synapses are changed. On the other hand, a
synaptic change can destroy old paths in the neural network which were already learned
correctly, so that they are unlearned. Therefore the three parameters constituting our
learning rule, $\Theta$, $\alpha$ and $\tau$, have to be chosen such that the probability
to fulfill the update condition $p_{\mathrm{coin}} < p^{\mathrm{rank}}_{\tilde{c}_{ij}}$
is in accord with the motto given above.

FIG. 5: Mean ensemble error $E$ in dependence of the noise $\eta$ and the exponent $\tau$.
Here $\alpha = 1.5$ was held fixed. Left: $\delta = 1$. Right: $\delta \in [0, 20]$. Top:
$\Theta = 0$. Bottom: $\Theta = 5$.
Let us start at $\delta = 1$ and $\Theta = 0$ with a qualitative explanation. Here low
$\tau$ values give either equal or slightly better results. This indicates that the
probability to fulfill the update condition
$p_{\mathrm{coin}} < p^{\mathrm{rank}}_{\tilde{c}_{ij}}$ is more adequate for low $\tau$
values than for high ones. For low $\tau$ values the update probability is lower than for
high $\tau$ values because of equation (11), under the natural assumption that the neural
network has not learned the XOR problem up to a certain time step, which implies high
values of the approximated synaptic counters $\tilde{c}_{ij}$. This holds for every
$\Theta$ value [9] and hence also for $\Theta = 5$, shown in the bottom left panel of
figure 5.

When $\Theta$ is gradually increased from $\Theta = 0$ to $\Theta = 5$ there is an
additional effect on the performance. The update probability crosses the threshold from
too high to too low values. This can be seen by increasing $\Theta$ for fixed $\tau$
values, which causes a decrease of the update probability because of equation (11). In the
bottom left panel of figure 5 one can see that for $\Theta = 5$ and $\tau < 2.0$ this
leads to an improvement of the performance of the network. This is in accordance with the
explanation for the top left panel. For $\tau > 2.0$ one would expect worse results than
for $\tau < 2.0$, which is true, but better results for $\Theta = 5$ than for
$\Theta = 0$. The latter, however, does not hold, because for these parameters the update
probability has crossed the critical threshold and hence is too small, which prevents
efficient learning.

Whether the update probability is greater or smaller than this critical threshold can also
be seen from the steps occurring in figure 5. A step occurring at $\eta \sim O(1)$ is
obvious for $\delta = 1 \sim O(1)$. A shift to higher or lower noise values indicates a
decreased (increased) update probability, because averaging over the past network outcomes
is prolonged (reduced). This follows from the fact that the averaging time needed to
detect a signal in a time series contaminated by noise has to be longer the higher the
influence of the noise is.

The influence of a variable synaptic change $\delta$ consists in an enhancement of the
effects described above. An update probability which is too high is punished more strongly
due to higher mean synaptic changes and results in a higher probability of destroying
already learned paths in the network. This can be seen in both right panels of figure 5.

FIG. 6: Mean ensemble error $E$ in dependence of the noise $\eta$ and the exponent
$\alpha$. Here $\tau = 2.0$ was held fixed. Left: $\delta = 1$. Right:
$\delta \in [0, 20]$. Top: $\Theta = 0$. Bottom: $\Theta = 5$.

Figure 6 shows the same results for the mean ensemble error $E$ as figure 5, but now
$\alpha$ is variable and $\tau = 2.0$ is constant. The occurring effects are again
explained by the influence of the three parameters $\Theta$, $\alpha$ and $\tau$ on the
probability of fulfilling the update condition
$p_{\mathrm{coin}} < p^{\mathrm{rank}}_{\tilde{c}_{ij}}$. The most interesting result here
is the strong dependence of the mean ensemble error $E$ on $\Theta$ for $\delta = 1$. For
$\Theta = 0$ the exclusive-or (XOR) problem cannot be learned completely for
$\alpha \approx 3.0$, even for very low noise values, but only for low $\alpha$ values.
The situation is almost completely changed for $\Theta = 5$. Now the performance for
$\alpha \approx 0.0$ is worse than for higher $\alpha$ values. This reflects too high
(low) update probabilities for $\Theta = 0$ and $\alpha \approx 3.0$ ($\Theta = 5$ and
$\alpha \approx 0.0$).



IV. BIOLOGICAL INTERPRETATION 

In recent years an increasing number of experimental
studies have investigated heterosynaptic plasticity. In
homosynaptic plasticity only the synapse between the
active pre- and postsynaptic neuron is changed, in the
form of either long-term depression (LTD) or long-term
potentiation (LTP). In contrast, heterosynaptic plasticity
also concerns further remote synapses of the pre- and
postsynaptic neuron. This scenario is schematically
depicted in figure 7. There we suppose that neurons 5 and 6
were active and induced (homo-)synaptic plasticity on
the synapse which is enclosed by these neurons. In
addition to this form of plasticity, Fitzsimonds et al. [11]
found in cultured hippocampal neurons that the induction
of LTD is also accompanied by backpropagation of
depression in the dendritic tree of the presynaptic neuron.
Furthermore, depression also propagates laterally in the
pre- and postsynaptic neuron. Similar results hold for
the propagation of LTP, see for a review.

FIG. 7: Schematic depiction of the interplay of the neuron
counters and their influence on the approximated synaptic
counters. c_i are the neuron counters and c̃_ij are the
approximated synaptic counters.

The correspondence to our learning rule follows
immediately from the working principle of our neuron
counters. In figure 7 the neuron counters are shown as c_i,
i ∈ {1, ..., 8}, for each neuron in the schematic network.
According to our learning rule there is a communication
between the neuron counters of adjacent neurons. This
communication leads to the formation of the approximated
synaptic counters c̃_ij. From this, one can see that
an alteration of the neuron counters c_5 and c_6 leads not
only to an alteration of c̃_56 but also of all approximated
synaptic counters c̃_k5, c̃_5l, c̃_m6 and c̃_6n with k ∈ {2,3},
l ∈ {6,7}, m ∈ {4,5} and n ∈ {8}. In biological terms
c̃_k5 corresponds to backpropagation, c̃_5l to presynaptic
lateral and c̃_m6 to postsynaptic lateral spread of LTD.
Interestingly, the term c̃_6n, which would correspond to
forward propagated postsynaptic LTD, has not been found
experimentally up to now |2(.
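
The bookkeeping in this paragraph can be made explicit with
a small sketch. It assumes a directed edge list read off from
the index sets quoted above (2→5, 3→5, 5→6, 5→7, 4→6, 6→8);
the complete wiring of figure 7, the function name
classify_spread and the category labels are illustrative
assumptions, and the way the counters are actually combined
is not modelled here. For an active pair (pre, post) the code
lists every synapse that shares a neuron with the pair and
labels it with the corresponding form of spread.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (presynaptic neuron, postsynaptic neuron)

# Hypothetical wiring, taken only from the index sets quoted in the text.
EDGES: List[Edge] = [(2, 5), (3, 5), (5, 6), (5, 7), (4, 6), (6, 8)]

def classify_spread(edges: List[Edge], pre: int, post: int) -> Dict[str, List[Edge]]:
    """Group synapses touching the active pair (pre, post) by spread type.

    The homosynaptic synapse (pre, post) is reported separately, although
    the index sets in the text also contain it (l = 6 and m = 5).
    """
    out: Dict[str, List[Edge]] = {
        "homosynaptic": [],
        "backpropagation": [],        # k -> pre
        "presynaptic lateral": [],    # pre -> l, l != post
        "postsynaptic lateral": [],   # m -> post, m != pre
        "forward propagation": [],    # post -> n
    }
    for i, j in edges:
        if (i, j) == (pre, post):
            out["homosynaptic"].append((i, j))
        elif j == pre:
            out["backpropagation"].append((i, j))
        elif i == pre:
            out["presynaptic lateral"].append((i, j))
        elif j == post:
            out["postsynaptic lateral"].append((i, j))
        elif i == post:
            out["forward propagation"].append((i, j))
    return out

for kind, synapses in classify_spread(EDGES, pre=5, post=6).items():
    print(kind, synapses)
```

In this sketch the only synapse in the forward propagation
group is 6→8, the term for which no experimental counterpart
is reported in the text.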

A biological explanation for the cellular mechanisms of
these findings is currently under investigation.
Fitzsimonds et al. suggest the existence of retrograde
signaling from the post- to the presynaptic neuron which
could produce a secondary cytoplasmic factor for
backpropagation and presynaptic lateral spread of LTD. On
the postsynaptic side, lateral spread of LTD could be
explained similarly under the assumption that there is a
blocking mechanism for the cytoplasmic factor which
prevents forward propagated LTD. They are of the opinion
that extracellular diffusible factors are of minor
importance. In an abstract sense the approximated synaptic
counters of our learning rule could be interpreted as an
intracellular mechanism and not as an extracellular one.
This would be consistent with the suggestions of
Fitzsimonds et al.

Future experiments will show whether the non-existence
of forward propagated LTD is confirmed or rejected.
From a theoretical point of view, and based on the
assumptions made in this paper, such a symmetry breaking
mechanism occurring during the propagation of
heterosynaptic LTD would be more elaborate than our
stochastic Hebb-like learning rule.

V. CONCLUSIONS 

In this article we presented a novel stochastic Hebb-like
learning rule for neural networks and demonstrated its
working mechanism by the example of learning the
exclusive-or (XOR) problem in a three-layer network. We
investigated the convergence behavior by extensive
numerical simulations for three different network
dynamics, all of which correspond to biological forms of
lateral inhibition. In all cases we found parameter
configurations for Θ, the length of the neuron memory, α,
the exponent of the coin distribution, and τ, the exponent
of the rank ordering distribution, which constitute the
Hebb-like learning rule, that yield not only a solution to
the exclusive-or (XOR) problem but also results comparable
to those of the learning rule recently proposed by Klemm,
Bornholdt and Schuster [16]. Comparable means that for the
exclusive-or (XOR) problem Θ_KBS = 2 was always better
than any parameter configuration {Θ, α, τ} for our
learning rule, but for Θ_KBS ≠ 2 there are many parameter
configurations {Θ, α, τ} which result in a faster
convergence in dependence of the time scale. In this
point we agree with 0, 0], who take the view that natural
systems try to solve problems in a satisficing rather than
an optimal way in a mathematical sense, because of the
lack of information that biological systems face due to
their inherent open character. In this respect our model
offers a large variety of parameter configurations which
work similarly well without the need to find the very best
one. The best parameter configuration can of course be
found, as shown in sections III A and III B. But that does
not mean that other parameter configurations do not work
at all. Our aim was to establish a Hebb-like learning rule
which is very flexible with respect to special choices of
the three parameters {Θ, α, τ}.

Moreover, our learning rule works comparably well to
[16] if one keeps in mind that our learning rule uses far
fewer parameters. Because the number of neurons is always
(much) smaller than the number of synapses, the same
holds for the respective numbers of neuron and synaptic
counters which were used in the two learning rules.
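
As a rough illustration of this counting argument, the
following sketch compares the number of neuron counters with
the number of synaptic counters in a layered feed-forward
network. The layer sizes are hypothetical choices, not the
networks used in the simulations, and full connectivity
between consecutive layers is assumed.

```python
from typing import Sequence, Tuple

def counter_counts(layers: Sequence[int]) -> Tuple[int, int]:
    """Return (neuron counters, synaptic counters) for a layered network.

    Assumption: one counter per neuron versus one counter per synapse,
    with every neuron connected to all neurons of the next layer.
    """
    neurons = sum(layers)
    synapses = sum(a * b for a, b in zip(layers, layers[1:]))
    return neurons, synapses

print(counter_counts((2, 3, 1)))     # (6, 9)
print(counter_counts((10, 20, 10)))  # (40, 400)
```

Under these assumptions the gap between the two numbers
widens quickly as the layers grow.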

An interesting implication of our learning rule and its
inherent stochastic character is that it offers a very
simple qualitative explanation of heterosynaptic plasticity
as observed experimentally. In addition to the
experimentally observed backpropagation and pre- and
postsynaptic lateral spread of long-term depression (LTD),
our learning rule predicts forward propagated postsynaptic
LTD as a consequence of the symmetric communication
between adjacent neurons. As far as we know there is no
theoretical explanation of that phenomenon so far.

In further investigations we will demonstrate that our
learning rule is not restricted to a multilayer network
topology but also works in a class of recurrent networks
constructed by an algorithm of Watts and Strogatz p6l |
when learning the problem of timing. Moreover, it
would be of interest to shed light on the power law ansatz
for the rank ordering II II and coin distribution 1141,
which was motivated by in a more general context of
stochastic optimization methods for rule-based systems.